BananaSet: A dataset of banana varieties in Bangladesh

This article introduces a primary dataset collected from diverse marketplaces in Bangladesh, covering six banana varieties that are widely consumed locally: Shagor, Shabri, Champa, Anaji, Deshi, and Bichi. High-resolution images of bananas from each category were acquired using a smartphone camera. A total of 1166 images were captured, and they are not uniformly distributed across the categories. The augmented set, in contrast, contains 1000 images per category, for a total of 6000 images. The dataset represents six distinct banana varieties, each with its own flavor, and has potential applications in the agriculture and food manufacturing industries.


Value of the Data
• This new dataset of banana fruit images can serve as the foundation for an automated classification system for industrial use.
• It can serve as a resource for agricultural research, enabling the identification and classification of distinct banana types and aiding crop management.
• It is relevant to machine learning, supporting the development of image recognition models and automated fruit sorting systems.
• Farmers and agricultural experts can use it to improve the precision of banana variety identification and crop management, which may ultimately improve agricultural yields.
• The dataset is potentially useful to the computer science community, especially in computer vision, machine learning, and deep learning. It can be used to build robust classification models that accurately distinguish between banana types.
• Such models could help farmers optimize their pre-planting decisions, saving the time and resources spent cultivating specific banana varieties while reducing the risk of planting the wrong type.

Background
Bananas (family Musaceae), known as the "fruit of the wise," are cherished for their tropical sweetness and nutritional value. Rich in potassium and fiber, they provide both energy and flavor. Because bananas grow in many countries and are available throughout the year, they have become the fourth most significant climacteric fruit commodity [ 1 ], suitable for both domestic consumption and export [ 2 ]. Bananas are propagated by planting shoots, and their adaptability to warm climates sustains countless livelihoods. Beyond their taste, bananas feature in a wide range of dishes and offer substantial nutrition, underscoring their significance in agriculture and culinary traditions worldwide.

Data Description
The proposed dataset comes in two variants: a raw images folder and an augmented images folder. Each is organized into six sub-folders, one per banana variety. The raw images folder contains 1166 images in JPG format, all at a uniform resolution of 4608 × 3456 pixels. Because of this high resolution, the raw images initially occupied 4.08 GB. Compressing them into a zip archive reduced this to 3.55 GB.
Data augmentation was then applied, because deep learning models for machine vision require a large volume of images. The transformations used were scaling, shifting, shearing, zooming, and random rotation. Rotation angles ranged from 1° to 40°, and the width shift, height shift, zoom, and shear ranges were each set to 0.2. From the raw images in each class, we generated 1000 augmented images, giving a total of 6000 augmented images (1000 per class). The augmented images folder therefore contains 6000 JPG images, also at 4608 × 3456 pixels. It occupies 4.73 GB uncompressed and 4.46 GB after compression.
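The augmentation settings above can be read as one random affine transform per generated image. The following is a minimal sketch under our own assumptions, not the authors' implementation:

- rotation angles are drawn uniformly from 1° to 40°;
- the 0.2 shift range is a fraction of image width and height;
- the 0.2 zoom range means one isotropic scale factor between 0.8 and 1.2;
- the 0.2 shear range is a horizontal shear factor between -0.2 and 0.2.

`AugParams`, `sample_params`, `affine_matrix`, and `transform_point` are hypothetical names. The matrix only describes the geometry; resampling the pixels would require an image library.

```python
import math
import random
from dataclasses import dataclass

Matrix = list[list[float]]


@dataclass
class AugParams:
    """Random settings for one augmented image (hypothetical container)."""
    angle_deg: float  # rotation angle in degrees
    shift_x: float    # horizontal shift as a fraction of image width
    shift_y: float    # vertical shift as a fraction of image height
    zoom: float       # isotropic scale factor
    shear: float      # horizontal shear factor


def sample_params(rng: random.Random, max_range: float = 0.2) -> AugParams:
    """Draw one parameter set uniformly from the ranges assumed in the lead-in."""
    return AugParams(
        angle_deg=rng.uniform(1.0, 40.0),
        shift_x=rng.uniform(-max_range, max_range),
        shift_y=rng.uniform(-max_range, max_range),
        zoom=rng.uniform(1.0 - max_range, 1.0 + max_range),
        shear=rng.uniform(-max_range, max_range),
    )


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def translation(tx: float, ty: float) -> Matrix:
    """Homogeneous 3x3 translation matrix."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def affine_matrix(p: AugParams, width: int, height: int) -> Matrix:
    """Zoom, shear and rotate about the image centre, then apply the random shift."""
    t = math.radians(p.angle_deg)
    rotation = [[math.cos(t), -math.sin(t), 0.0],
                [math.sin(t), math.cos(t), 0.0],
                [0.0, 0.0, 1.0]]
    shear = [[1.0, p.shear, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    zoom = [[p.zoom, 0.0, 0.0], [0.0, p.zoom, 0.0], [0.0, 0.0, 1.0]]
    cx, cy = width / 2, height / 2
    m = translation(-cx, -cy)  # move the image centre to the origin
    for step in (zoom, shear, rotation, translation(cx, cy),
                 translation(p.shift_x * width, p.shift_y * height)):
        m = matmul(step, m)  # left-multiplying applies each step after the previous ones
    return m


def transform_point(m: Matrix, x: float, y: float) -> tuple[float, float]:
    """Map pixel coordinate (x, y) through the affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for a reproducible example
    params = sample_params(rng)
    m = affine_matrix(params, 4608, 3456)  # raw image size stated in the text
    print(params)
    print(transform_point(m, 2304, 1728))  # image centre after the transform
```

Rotating about the image centre, rather than the pixel origin, keeps the banana roughly in frame. The final shift step then moves the centre by at most 20% of the width and height.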
The dataset generation process is depicted in Fig. 2, and Fig. 3 shows augmented images derived from an original sample image. The banana dataset is available from the Mendeley repository [ 3 ] as two zip files, 'Augmented_Images.zip' and 'Original_Images.zip'.
Both zip files contain six folders, each representing one banana variety: Shagor, Shabri, Champa, Anaji, Deshi, and Bichi. Some of these varieties also have official names: Shagor is BARI-1, Anaji is BARI-2, and Deshi is BARI-4 [ 4 ]. Fig. 1 shows the folder structure of the raw and augmented datasets.
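Because the class label is encoded only in the sub-folder name, a loader needs little more than a directory walk. The sketch below assumes the archives have been extracted so that a root directory contains six sub-folders named exactly Shagor, Shabri, Champa, Anaji, Deshi, and Bichi. The actual folder names and nesting inside the Mendeley archives may differ. `index_dataset` and `class_counts` are hypothetical helpers and are not part of the dataset release.

```python
from collections import Counter
from pathlib import Path

# Assumed sub-folder names; adjust if the extracted archive differs.
VARIETIES = ("Shagor", "Shabri", "Champa", "Anaji", "Deshi", "Bichi")
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def index_dataset(root: Path) -> list[tuple[Path, str]]:
    """Return (image path, label) pairs, using each sub-folder name as the label."""
    samples = []
    for variety in VARIETIES:
        folder = root / variety
        if not folder.is_dir():
            raise FileNotFoundError(f"missing class folder: {folder}")
        for path in sorted(folder.iterdir()):
            # Skip non-image files; compare suffixes case-insensitively (.JPG, .jpg).
            if path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, variety))
    return samples


def class_counts(samples: list[tuple[Path, str]]) -> Counter:
    """Count how many images each class contributes."""
    return Counter(label for _, label in samples)
```

For example, `class_counts(index_dataset(Path("Original_Images")))` should reproduce the per-class totals listed in Table 1, provided the extracted layout matches the assumption above. The `Original_Images` path is likewise an assumption.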
This dataset can serve as a benchmark for developing machine vision algorithms that classify banana varieties in the agricultural domain.

Experimental Design, Materials and Methods
The image acquisition workflow for each banana type is shown in Fig. 4. Individual bananas were selected at random from each type's population, so that every banana of a given type had an equal probability of being selected. This random selection helped keep the data diverse. In practice, each banana chosen for photography was picked at random from within a cluster of bananas.
Several challenges arose during data collection. The bananas were sourced from naturally growing trees and local markets, and a notable portion showed decay, partial consumption by bats or worms, or other damage, which could hinder identification. Bananas with such deformities were therefore excluded from the dataset.
After the images of each banana type were captured, they were transferred from the smartphone's memory to an external hard drive and organized into folders named after the corresponding banana type. Local experts, together with a domain expert, helped label the dataset correctly. After the labels were confirmed, the local names were used as class labels because they are more widely known than the official names. Image acquisition for the next banana type began only after the images of the previous type had been removed from the smartphone's memory. This process continued until images of all banana categories had been obtained, covering ripeness stages from unripe to ripe. Table 1 summarizes the statistics of the dataset.

Model validation
We validated the dataset by training a conventional deep learning model on it, which produced state-of-the-art results. Validation evaluated the model's performance on the dataset through several stages:

- data preprocessing;
- data partitioning;
- model training;
- performance assessment on a validation set;
- testing on an entirely separate test set.

The data was split in an 80:20 ratio, with 80% used for training and 20% for testing. This systematic approach helps ensure that the model generalizes to new data and produces dependable, accurate results.
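The 80:20 partition can be sketched as follows. The source does not say whether the split was stratified by class. This sketch assumes a per-class (stratified) split so that each variety keeps the same 80:20 ratio. `stratified_split` is a hypothetical helper, and items can be anything hashable, such as image file paths.

```python
import random
from collections import defaultdict
from typing import TypeVar

T = TypeVar("T")


def stratified_split(
    samples: list[tuple[T, str]], train_fraction: float = 0.8, seed: int = 0
) -> tuple[list[tuple[T, str]], list[tuple[T, str]]]:
    """Split (item, label) pairs so that every class keeps the same train/test ratio."""
    by_label: dict[str, list[T]] = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train: list[tuple[T, str]] = []
    test: list[tuple[T, str]] = []
    for label in sorted(by_label):  # fixed class order so the same seed gives the same split
        items = list(by_label[label])
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train.extend((item, label) for item in items[:cut])
        test.extend((item, label) for item in items[cut:])
    return train, test


if __name__ == "__main__":
    demo = [(f"{v}_{i}.jpg", v) for v in ("Shagor", "Bichi") for i in range(10)]
    train, test = stratified_split(demo)
    print(len(train), len(test))  # 16 4
```

One design caveat applies if the split is made on the augmented folder. Variants generated from the same raw photograph can end up in both the training and test sets, which tends to inflate test accuracy. Splitting the raw images first, or grouping augmented images by their source image, avoids this leakage.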

Limitations
None.

Ethics Statement
This article does not contain any studies with human or animal subjects performed by any of the authors. The datasets used in this article are publicly accessible. When using these datasets, please follow the appropriate citation guidelines.

Fig. 2. Examples from each banana class.

Table 1
Statistics of the banana varieties dataset.

Table 2
Specifications of the VIVO V25 5G smartphone camera.